Editorial: Humans over machines: The New York Times seeks to protect journalism in suing OpenAI and Microsoft

The New York Times Building is seen in New York City on February 4, 2021. (Photo by Daniel SLIM / AFP)

The New York Times is not content to let OpenAI and Microsoft get rich using the newspaper's web content for artificial intelligence like ChatGPT without paying, so it sued this week in Manhattan federal court.

We don't know if the pun on the word "content" is particularly funny or if an AI robot would be inclined to such wordplay, but we do know it was written by humans: us, the people behind the words and pictures of the Daily News.

And it should stay that way, with people, not machines, telling the human stories of the city and the world for other humans to read.

We could have asked ChatGPT to write an editorial about how it is bad that ChatGPT lifted material wholesale from a newspaper, without paying, to teach itself how to replace newspapers. However, while AI doesn't get tired of gimmicks, real people do.

The Times alleges in its lawsuit that ChatGPT was fed huge numbers of articles from the paper's website to train the program, which is a "large language model." As the complaint states, "an LLM works by predicting words that are likely to follow a given string of text based on the potentially billions of examples used to train it."

For teaching material, "the training set was comprised of 45 terabytes of data — the equivalent of a Microsoft Word document that is over 3.7 billion pages long." Not all of that was from the Times, but that newspaper's work, like the work of this newspaper and website, is copyrighted and can't just be lifted.

When OpenAI began, it was a nonprofit, sharing all its creations for the betterment of humankind. But the humans in charge didn't need AI to learn about the profit motive, and they switched to a proprietary system to sell. No more sharing from them, and customers have to pay. But they get to borrow and not pay?

They could have used non-copyrighted works for their classroom, like the collected plays, poems and sonnets of Shakespeare. That would be perfectly legal, but ChatGPT would then write and speak like an Englishman of 400 years ago. Such sterner stuff is Greek to me in this brave new world and not a wild-goose chase melted into thin air. Would star-crossed lovers be a tower of strength or get short shrift?

Or they could have borrowed without any limits from Mark Twain, who during his lifetime fought furiously for copyright protections. But Twain died in 1910, and all his creations are now free for everyone to use. Under U.S. law, current copyright protection lasts for the life of the author plus 70 more years.

Other works have a 95-year shield, like Walt Disney’s 1928 “Steamboat Willie,” the first appearance of Mickey Mouse, which will enter the public domain on Monday, Jan. 1. Sound recordings are protected for 100 years.

Craigslist, Google and Facebook vacuumed up revenue from the press, devouring classifieds, display ads and circulation. Now another internet invention, AI, is threatening the press itself. As the lawsuit says, “if the Times and other news organizations cannot produce and protect their independent journalism, there will be a vacuum that no computer or artificial intelligence can fill.” And yes, we meant to play on the word “vacuum.”